StreamCloud: An Elastic Parallel-Distributed Stream Processing Engine. (StreamCloud: un moteur de traitement de streams parallèle et distribué)

Author

  • Vincenzo Gulisano
Abstract

In recent years, applications in domains such as telecommunications, network security, and large-scale sensor networks have exposed the limits of the traditional store-then-process paradigm. In this context, Stream Processing Engines emerged as a candidate solution for applications that demand high processing capacity with low processing latency guarantees. With Stream Processing Engines, data streams are not persisted but processed on the fly, producing results continuously. Current Stream Processing Engines, whether centralized or distributed, do not scale with the input load because of single-node bottlenecks. Moreover, they rely on static configurations that lead to either under- or over-provisioning. This Ph.D. thesis proposes StreamCloud, an elastic parallel-distributed stream processing engine that enables the processing of large data stream volumes. StreamCloud minimizes the distribution and parallelization overhead by introducing novel techniques that split queries into parallel subqueries and allocate them to independent sets of nodes. Moreover, StreamCloud's elastic and dynamic load balancing protocols enable effective adjustment of resources depending on the incoming load. Together with the parallelization and elasticity techniques, StreamCloud defines a novel fault tolerance protocol that introduces minimal overhead while providing fast recovery. StreamCloud has been fully implemented and evaluated using several real-world applications, such as fraud detection and network analysis. The evaluation, conducted on a cluster with more than 300 cores, demonstrates the high scalability, elasticity, and fault tolerance effectiveness of StreamCloud.

Similar resources

SVM incrémental, parallèle et distribué pour le traitement de grandes quantités de données

Abstract. We present a new linear and non-linear, parallel and distributed SVM (Support Vector Machine) algorithm that can process large datasets in a limited time on standard hardware. Starting from the NewtonGSVM algorithm proposed by Mangasarian, we built an incremental, parallel and distributed algorithm...

Approche préventive pour une gestion élastique du traitement parallèle et distribué de flux de données

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers...

Formal Semantics of Array-OL, a Domain Specific Language for Intensive Multidimensional Signal Processing

In several application domains (detection systems, telecommunications, video processing, etc.) the applications deal with multidimensional data. These applications are usually embedded and subjected to real-time and resource constraints. The challenge is thus to provide efficient implementations on parallel and distributed architectures. Array-OL has been designed specifically to handle this ki...

Interprétation linguistique de requêtes pour un moteur de questions réponses grand public

In this article we describe the use of a Natural Language Processing platform to interpret user queries to be fed into a question-answering system. The advantage of this system is threefold: first, we are able to identify only those requests which correspond to factual questions for which the engine has a precise answer; second, error correction is semantically controlled, in order to avoid bad o...

Tetanus Study in Children - Introduction of Several Patients

On tetany in children: presentation of several cases, review of the literature. Among the various etiologies of infantile tetany, rickets and malabsorption are the most frequent causes in Iran. Tetany is observed more readily at the onset of rickets and in its advanced stages. A single case of tetany was observed in a three-month-old child, a former pre...

Publication date: 2012